Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks
An important goal of computer vision is to build systems that learn visual representations over time that can be applied to many tasks. In this paper, we investigate a vision-language embedding as a core representation and show that it leads to better cross-task transfer than standard multi-task learning. In particular, the task of visual recognition is aligned to the task of visual question answering (VQA) by forcing each to use the same word-region embeddings. We show this leads to greater inductive transfer from recognition to VQA than standard multi-task learning. Visual recognition also improves, especially for categories that have relatively few recognition training labels but appear often in the VQA setting. Thus, our paper takes a small step towards creating more general vision systems by showing the benefit of interpretable, flexible, and trainable core representations.
Comment: Accepted in ICCV 2017. The arXiv version has an extra analysis on correlation with human attention.
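The core idea is that recognition and VQA score against one shared table of word embeddings. The sketch below illustrates only that sharing. It assumes region features and words already live in a common vector space and uses cosine similarity as the score. The embedding table `WORD_EMB` and the functions `recognize` and `answer_what_is` are hypothetical toy stand-ins, not the paper's learned model or code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical toy word embeddings, shared by BOTH tasks below.
# In the paper these are learned; the values here are made up for illustration.
WORD_EMB = {
    "dog": [1.0, 0.1, 0.0],
    "cat": [0.1, 1.0, 0.0],
    "car": [0.0, 0.1, 1.0],
}

def recognize(region_vec, vocab=WORD_EMB):
    """Recognition: return the category word whose embedding best matches one region."""
    return max(vocab, key=lambda w: cosine(region_vec, vocab[w]))

def answer_what_is(region_vecs, candidate_answers, vocab=WORD_EMB):
    """VQA-style scoring: rank candidate answer words by their best match to any
    region, reusing the same word embeddings that recognition uses."""
    def score(word):
        return max(cosine(r, vocab[word]) for r in region_vecs)
    return max(candidate_answers, key=score)
```

Because both functions read from the same `WORD_EMB`, any update that improves a word's embedding for one task is automatically visible to the other. That coupling is the kind of cross-task transfer the abstract describes.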
Information Brokers: A Comparison of the Web Browser Choices between Internet Users in the US and China
By treating web browsers as information brokers, this dissertation found that the rise of Google Chrome in China and the United States (two countries with vastly different regulations) is contingent on the cultural reputations of Google and its competitors (as suggested by previous research). This dissertation also found that Chrome's popularity in the US and China is affected by how it is connected to other market entities and popular web services. By examining how a popularly utilized tool is institutionalized in two different countries, this dissertation articulates a new theoretical framework, combining the sociology of consumption and social network theory, that is better suited to studying online platforms that broker content for internet users.
Walverine: A Walrasian Trading Agent
TAC-02 was the third in a series of Trading Agent Competition events fostering research in automating trading strategies by showcasing alternate approaches in an open-invitation market game. TAC presents a challenging travel-shopping scenario where agents must satisfy client preferences for complementary and substitutable goods by interacting through a variety of market types. Michigan's entry, Walverine, bases its decisions on a competitive (Walrasian) analysis of the TAC travel economy. Using this Walrasian model, we construct a decision-theoretic formulation of the optimal bidding problem, which Walverine solves in each round of bidding for each good. Walverine's optimal bidding approach, as well as several other features of its overall strategy, is potentially applicable in a broad class of trading environments.
Keywords: trading agent, trading competition, tatonnement, competitive equilibrium
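The keywords mention tatonnement, the classic price-adjustment process for finding a competitive (Walrasian) equilibrium. Under it, the price of each good rises when demand exceeds supply and falls otherwise, until excess demand vanishes. The sketch below shows that generic process only. It is not Walverine's model of the TAC economy. The step size, tolerance, and the toy excess-demand function in the usage note are all assumptions chosen for illustration.

```python
from typing import Callable, List

def tatonnement(
    excess_demand: Callable[[List[float]], List[float]],
    prices: List[float],
    step: float = 0.1,
    tol: float = 1e-9,
    max_iter: int = 10_000,
) -> List[float]:
    """Generic tatonnement: move each price in proportion to its excess demand
    until every good's excess demand is within `tol` of zero (or `max_iter` is hit)."""
    p = list(prices)
    for _ in range(max_iter):
        z = excess_demand(p)
        if max(abs(zi) for zi in z) < tol:
            break  # approximately market-clearing prices found
        # Over-demanded goods (zi > 0) get pricier; over-supplied goods get cheaper.
        # Prices are clipped at zero so they never go negative.
        p = [max(0.0, pi + step * zi) for pi, zi in zip(p, z)]
    return p
```

For example, consider two independent toy goods. The first has demand `10 - p` and supply `2p`, and the second has demand `8 - p` and supply `p`. Calling `tatonnement(lambda p: [(10 - p[0]) - 2 * p[0], (8 - p[1]) - p[1]], [1.0, 1.0])` converges to roughly `[3.333, 4.0]`, the prices at which each market clears. With real, interdependent goods, convergence of this simple rule is not guaranteed, which is one reason agents may prefer other equilibrium solvers.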
Collecting The Puzzle Pieces: Disentangled Self-Driven Human Pose Transfer by Permuting Textures
Human pose transfer synthesizes new view(s) of a person for a given pose. Recent work achieves this via self-reconstruction, which disentangles a person's pose and texture information by breaking the person down into parts, then recombines them for reconstruction. However, part-level disentanglement preserves some pose information that can create unwanted artifacts. In this paper, we propose Pose Transfer by Permuting Textures (PT²), an approach for self-driven human pose transfer that disentangles pose from texture at the patch level. Specifically, we remove pose from an input image by permuting image patches so that only texture information remains. Then we reconstruct the input image by sampling from the permuted textures for patch-level disentanglement. To reduce noise and recover clothing shape information from the permuted patches, we employ encoders with multiple kernel sizes in a triple-branch network. On DeepFashion and Market-1501, PT² reports significant gains on automatic metrics over other self-driven methods, and even outperforms some fully supervised methods. A user study also finds that images generated by our method are preferred over those from prior self-driven approaches in 68% of cases. Code is available at https://github.com/NannanLi999/pt_square.
Comment: Accepted to ICCV 202
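The pose-removal step can be illustrated on a toy image. Cutting the image into patches and shuffling their positions keeps every patch's texture intact while destroying the spatial layout that encodes pose. The sketch below is a minimal illustration of that permutation only. It is not the PT² code. It assumes a grayscale image stored as a list of rows, a square patch size that divides both image dimensions, and a seeded shuffle. The function name `permute_patches` is hypothetical.

```python
import random
from typing import List

Image = List[List[int]]  # toy grayscale image: a list of rows of pixel values

def permute_patches(img: Image, patch: int, seed: int = 0) -> Image:
    """Split img into non-overlapping patch x patch tiles, shuffle the tiles'
    positions, and reassemble them. Pixels inside each tile are left unchanged,
    so local texture survives while the global arrangement (pose) is scrambled."""
    h, w = len(img), len(img[0])
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by the patch size")
    rows, cols = h // patch, w // patch
    # Extract tiles in row-major order; each tile is a list of row slices.
    tiles = [
        [row[c * patch:(c + 1) * patch] for row in img[r * patch:(r + 1) * patch]]
        for r in range(rows)
        for c in range(cols)
    ]
    random.Random(seed).shuffle(tiles)  # seeded so the permutation is reproducible
    # Write the shuffled tiles back, filling positions in row-major order.
    out = [[0] * w for _ in range(h)]
    for idx, tile in enumerate(tiles):
        r, c = divmod(idx, cols)
        for dy, tile_row in enumerate(tile):
            out[r * patch + dy][c * patch:(c + 1) * patch] = tile_row
    return out
```

The output contains exactly the same set of tiles as the input, only rearranged. A model trained to reconstruct the original from such a scrambled input therefore has to infer pose from a separate signal rather than copy it from the texture layout.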